Mon. Not. R. Astron. Soc. 000,[T][T7](2006) Printed 12 November 2009 (MN KOgX style file v2.2) 



Bayesian non-linear large scale structure inference of the Sloan 
Digital Sky Survey data release 7 

Jens Jasche l , Francisco S. Kitaura 2 , Cheng Li 1 ,Torsten A. EnBlin 1 

1 Max-Planck-lnstitut fur Astrophysik , Karl-Schwarzschild Strafie 1, D-85748 Garching, Germany 

2 SNS, Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126 Pisa, Italy 



Submitted to MNRAS 12-Nov-2009 



ABSTRACT 

In this work we present the first non-linear, non-Gaussian full Bayesian large scale structure 
analysis of the cosmic density field conducted so far. The density inference is based on the 
Sloan Digital Sky Survey data release 7, which covers the northern galactic cap. We em- 
ploy a novel Bayesian sampling algorithm, which enables us to explore the extremely high 
dimensional non-Gaussian, non-linear log-normal Poissonian posterior of the three dimen- 
sional density field conditional on the data. These techniques are efficiently implemented in 
the HADES computer algorithm and permit the precise recovery of poorly sampled objects 
and non-linear density fields. The non-linear density inference is performed on a 750 Mpc 
cube with roughly 3 Mpc grid-resolution, while accounting for systematic effects, introduced 
by survey geometry and selection function of the SDSS, and the correct treatment of a Poisso- 
nian shot noise contribution. Our high resolution results represent remarkably well the cosmic 
web structure of the cosmic density field. Filaments, voids and clusters are clearly visible. 
Further, we also conduct a dynamical web classification, and estimated the web type posterior 
distribution conditional on the SDSS data. 

Key words: large scale - reconstruction -Bayesian inference - cosmology - observations - 
methods - numerical 



1 INTRODUCTION 

Observations of the large scale structure have always attracted enor- 
mous interest, since they contain a wealth of information on the 
origin and evolution of our Universe. The details of structure for- 
mation are very complicated and involve many different physical 
disciplines ranging from quantum field theory, general relativity or 
modified gravity to the dynamics of collisionless matter and the 
behavior of the baryonic sector. Throughout cosmic history the in- 
terplay of these different physical phenomena therefore has left its 
imprints in the matter disuibution surrounding us. Probes of the 
large scale structure, such as large galaxy surveys, hence enable us 
to test current physical and cosmological theories and will gener- 
ally further our understanding of the Universe. 

Especially a cosmographical description of the matter distri- 
bution will permit us to study details of structure formation mech- 
anisms and the clustering behavior of galaxies as well as it will 
provide information on the initial fluctuations and large scale cos- 
mic flows. For this reason, several different methods to recover the 
three dimensional density or velocity field from galaxy observa- 
tions have been developed and applied to existing galaxy surveys 
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tion three dimensional Wiener reconstruction of the Sloan Digital 
Sky Survey data release 6 data, which demonstrated the feasibil- 
ity of high precision density field inference from galaxy redshift 
surveys. These three dimensional density maps are interesting for 
a variety of different scientific applications, such as studying the 
dependence of galaxy properties on their cosmic environment, in- 
creasing the detectability of the integrated Sachs-Wolfe effect in 



the CMB or performing constrained simulations (see e.g. Bistolas 



In particular, recently Kitaura et al. (20091 presented a high resolu 
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However, modern precision cosmology demands an increas- 
ing control of observational systematic and statistical uncertainties, 
and the means to propagate them to any finally inferred quantity in 
order not to draw wrong conclusion on the theoretical model to be 
tested. For this reason, here we present the first application of the 
new Bayesian large scale structure inference computer algorithm 
HADES (HAmiltonian Density Estimation and Sampling) to data 
(see |IaschlT & Kitaura 2009, for a description of the algorithm). 
HADES performs a full scale non-linear, non-Gaussian Markov 
Chain Monte Carlo analysis by drawing samples from the lognor- 
mal Poissonian posterior of the three dimensional density field con- 
ditional on the data. This extremely high dimensional posterior dis- 
tribution, consisting of ~ 10 6 or more free parameters, is explored 
via a numerically efficient Hamiltonian sampling scheme which 
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suppresses the random walk behavior of conventional Metropolis 
Hastings algorithms by following persistent trajectories through the 
parameter space ( |Duane et aL|1987t|Ne3TT993]|T996| >. The advan- 
tages of this method are manyfold. Beside correcting observational 
systematics introduced by survey geometry and selection effects, 
the exact treatment of the non-Gaussian behavior and structure of 
the Poissonian shot noise contribution of discrete galaxy distribu- 
tions, permits very accurate recovery of poorly sampled objects, 
such as voids and filaments. In addition, the lognormal prior has 
been demonstrated to be an adequate statistical description for the 
present density field and hence enables us to infer the cosmic den- 



sity field deep into the non-linear regime (see e.g. Hubbl e] ! 1934 



Peebles|1980"l|Coles & Jones|1991| [Gaztanaga & Yokoyam a|1993 



Kayo et al.|2001) . The important thing to remark about HADES 
is, that it does not only yield a single estimate, such as a mean, 
mode or variance, in fact it provides a sampled representation of 
the full non-Gaussian density posterior. This posterior encodes 
the full non-linear and non-Gaussian observational uncertainties, 
which can easily be propagated to any finally inferred quantity. 

The application of HADES to Sloan Digital Sky Survey 
(SDSS) data therefore is the first non-linear, non-Gaussian full 
Bayesian large scale structure analysis conducted so far (SDSS; 
|York et al.|20 00 ). In particular, we applied our method to the recent 
SDSS data release 7 (DR7) data (DR7; |Abazajian et al.|2009) , and 
produced about 3TB of valuable scientific information in the form 
of 40000 high resolution non-linear density samples. The density 
inference is conducted on an equidistant cubic grid with side length 
750 Mpc consisting of 256 3 volume elements. The recovered den- 
sity field clearly reveals the cosmic web structure, consisting of 
voids, filaments and clusters, of the large scale structure surround- 
ing us. 

These results provide the basis for forthcoming investigations 
on the clustering behavior of galaxies in relation to their large- 
scale environment. Such analyses yield valuable information about 
the formation and evolution of galaxies. In example, it has been 
known since long that physical properties such as morphological 
type, color, luminosity, spin parameter, star formation rate, concen- 
tration parameter, etc., are functions of the cosmic environment (see 
e.g.|Dressler|1980||Postman & Geller|1984||Whitmore et al.|1993| 
Lewis et al.|2002||G6mez et al.|2003||Goto et al.|2003||Rojas et al l 
Kuehn & Ryden | 2005 
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In this work we already conduct a preliminary examination 
of the dependence of stellar mass M* and g — r color of galaxies 
on their large-scale environment. However, more thorough inves- 
tigations will be presented in following works. Analyzing galaxy 
properties in the large-scale environment also requires to classify 
the large scale structure into different cosmic web types. We do 
so by following the dynamic cosmic web type classification pro- 
cedure as proposed by Ha hn et aT] {2007 1 with the extension of 
|Forero-Romero et al.| ( |2009| . The application of this procedure to 
our results yields the cosmic web type posterior, which provides 
the probability of finding a certain web type (void, sheet, filament, 
halo) at a given position in the volume conditional on the SDSS 
data. This permits simple propagation of all observational uncer- 
tainties to the final analysis of galaxy properties. 

The paper is structured as follows. We start by a brief review 
of the methodology in section [2] particularly describing the log- 
normal Poissonian posterior and the Bayesian computer algorithm 
HADES. Additionally, here we describe the dynamic web classi- 
fication procedure as mentioned above. In section [3] we give a de- 



scription of the SDSS DR7 data and present necessary data prepa- 
ration steps required to apply the analysis procedure. Specifically, 
we describe the preparation of the linear observation response op- 
erator and the creation of the three dimensional data cube. In the 
following section [4] we present the results obtained from the non- 
linear, non-Gaussian sampling procedure. We start by analyzing the 
convergence behavior of the Markov chain via a Gelman & Rubin 
diagnostic, followed by a discussion of the properties of individ- 
ual Hamiltonian samples. Here we also provide estimates for the 
ensemble mean density field and according variance. These fields 
depict remarkable well the properties of the cosmic web consisting 
of voids, filaments and halos. Based on these results we perform 
a simple cosmic web classification in section [5] In section [6] we 
present a preliminary examination on the correlation between the 
large-scale environment of galaxies and their physical properties. 
In particular, here we study the stellar mass and g—r color of galax- 
ies in relation with the density contrast 6. We conclude the paper in 
section[7]by summarizing and discussing the results. 



2 METHODOLOGY 

In this section we give a brief review of the methods used for the 
large scale structure inference. In particular, we discuss the lognor- 
mal Poissonian posterior, and the according data model. Further, 
we give a description of the HADES algorithm and a dynamic cos- 
mic web classification procedure. 



2.1 Lognormal Poissonian posterior 

Precision inference of the large scale structure in the mildly and 
strongly non-linear regime requires detailed treatment of the non- 
Gaussian behavior of the large scale structure posterior. Although, 
the exact probability distribution for the density field in these 
regimes is not known, for a long time already it has been suggested 
that the fully evolved non-linear matter field can be well described 
by lognormal statistics (see e.g. Hubble;il934; Peebles 1980, |Coies] 
|& Jonesp99Tj |Gaztanaga & Yokoyama||1993||Kayo et al.|2001^ 
This phenomenological guess has been justified by the theoretical 
considerations of Coles & Jones (1991). They argue that assum- 
ing Gaussian initial conditions in the density and velocity distri- 
butions will lead to a log-normally distributed density field. It is a 
direct consequence of the continuity equation or the conservation 
of mass. In addition, the validity of the lognormal distribution as a 
description of the statistical properties of non-linear density fields 
has been evaluated in Kayo et al. (2001 1. In this work, they studied 
the probability distribution of cosmological non-linear density fluc- 
tuations from N-body simulations with Gaussian initial conditions. 
They found that the lognormal distribution accurately describes the 
non-linear density field even up to values of the density contrast of 
S ~ 100. In addition, recently Ki taura et al.| l(2009 1 analyzed the sta- 
tistical properties of the SDSS DR6 Wiener reconstructed density 
field, and confirmed a lognormal behavior. 

For all these reasons, we believe, that the statistical behavior of 
the non-linear density field can be well described by a multivariate 
lognormal distribution, as given by: 



^2ndet(Q) 



,(1) 



where s, is the density signal at the three dimensional cartesian 
position x h Q is the covariance matrix of the lognormal distribution 
and yu, describes a constant mean field given by: 
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This probability distribution, seems to be an adequate prior choice 
for reconstructing the present density field. 

Studying the actual matter distribution of the Universe re- 
quires to draw inference from some observable tracer particle, such 
as a set of observed galaxies. Assuming galaxies to be discrete par- 
ticles, their distribution can be described as a specific realization 
drawn from an inhomogeneous Poisson process (see e.g. |Layzer| 
|1956[|Peebles|1980||Martmez & S aar 20021. The according proba- 
bility distribution is given as: 



P({N* k }\{A k }) = f] 



(3) 



where N? is the observed galaxy number at position x k in the sky 
and A k is the expected number of galaxies at this position. The mean 
galaxy number is related to the signal s k via: 



A k = R k N(l + B(s) k ) , 



(4) 



where R k is a linear response operator, incorporating survey geome- 
tries and selection effects, is the mean number of galaxies in the 
volume and B{x) k is a non-linear, non local, bias operator at posi- 
tion x k . The lognormal prior given in equation |TJ together with the 
Poissonian likelihood given in equation l[3} yields the lognormal 
Poissonian posterior, for the density contrast s k given some galaxy 



observations Nf 
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It is important to note, that this is a highly non-Gaussian distribu- 
tion, and non-linear reconstruction methods are required in order 
to perform accurate matter field reconstructions in the non-linear 
regime. In example, estimating the maximum a posteriori values 
from the lognormal Poissonian distribution involves the solution of 
implicit equations. Several attempts to use a lognormal Poissonian 
posterior for density inference have been presented in literature. 
These attempts date back to |Sheth| ^995 ) who proposed to use a 
variable transformation in order to derive a generalized Wiener fil- 
ter for the lognormal distribution. This approach, however, yielded 
a very complex form for the noise covariance matrix making ap- 
plications to real data sets impractical. The first successful applica- 
tion of the lognormal Poissonian distribution for density inference 
was presented by Saunder s et al.| l |2000) . Their method is based 
on the expansion of the density logarithm into spherical harmon- 
ics (Saunders & Ballinger 2000). More accurate schemes based on 
maximum and mean posteriori principles were derived by ( EnBlin 
|et al.|20 08 1. Recently, an implementation of the maximum a poste- 
riori scheme was presented and thoroughly tested by iKita ura et al.| 
|2009| >. They found that, assuming a linear bias, the lognormal Pois- 
sonian posterior permits recovery of the density field deep in the 
nonlinear regime up to values S > 1000 of the density contrast. Fi- 
nally, Jasche & Kitaura ( 2009 1 developed the Hamiltonian density 
estimation and sampling scheme to map out the posterior probabil- 
ity distribution. 
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Figure 1. Results of the Gelman & Rubin convergence diagnostic. The 
PSRF indicates convergence. As can be seen the Gelman & Rubin test con- 
verges faster in regions with good data. 



2.2 HADES 

As already described above, Bayesian non-linear large scale struc- 
ture inference requires to sample from non-Gaussian posterior dis- 
tributions. In order to do so, we developed HADES (see Jasche 
|et al.|2009| for more details). HADES explores the very high di- 
mensional parameter space of the three dimensional density field 
via an Hamiltonian Monte Carlo (HMC) sampling scheme. Unlike 
conventional Metropolis Hastings algorithms, which move through 
the parameter space by a random walk, and therefore require pro- 
hibitive amounts of steps to explore high dimensional spaces, the 
HMC sampler suppresses random walk behavior by introducing 
a persistent motion of the Markov chain through the parameter 
space ( |Duane et allp987| [Neal][T993l [7996) . In this fashion, the 
HMC sampler maintains a reasonable efficiency even for high di- 
mensional problems (Hanson 2001). Since it is a fully Bayesian 
method, the scientific output is not a single estimate, but a sam- 
pled representation of the multidimensional lognormal Poissonian 
posterior distribution given in equation {3}. Given this representa- 
tion of the posterior any desired statistical summary, such as mean, 
mode or variances can easily be calculated. Further, any uncertainty 
can seamlessly be propagated to the finally estimated quantities, by 
simply applying the according estimation procedure to all Hamil- 
tonian samples. For a detailed description of the theory behind the 
large scale structure sampler and its numerical implementation see 
|Jasche et al.1p009) . 



2.3 Classification of the cosmic web 

The results generated by the Hamiltonian sampler HADES will per- 
mit a variety of scientific analyses of the large scale structure in the 
observed Universe. An interesting example is to classify the cos- 
mic web, in particular identifying different types of structures in the 
density field. Such an analysis, in example, is valuable for studying 
the environmental dependence of galaxy formation and evolution 
(see e.g. jLee & Le e 2008 , Lee & Li 2008 1. Since the structure clas- 
sification is not always unique, we provide the full Bayesian poste- 
rior distribution of the structure type at a given position conditional 
on the observations. 

However, to do so we first need a means to identify differ- 
ent structure types from the density field. Numerous methods for 
structure analysis have been presented in literature (see e.g. |Lem- 
son & Kauffmann 



1999, Colberg et al 



2005 



2007 



Novikov et al. 2006 
Colberg et al|2008 



|Hahn et al.|2007||Arag6n-Calvo et al. 

|Forero-Romero et al.|2009| . In principle, all of these methods can 
be applied for the analysis of the Hamiltonian samples, however 
for the purpose of this paper we follow the dynamical cosmic web 
classification procedure as proposed by H ahn et aT] p007 l. They 
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Figure 2. Three different slices from different sides through density fields. Left panels show slices through one of the 40000 density sample, middle panels 
depict the estimated ensemble mean and right panels demonstrate the according slices through the three dimensional response operator R,. It can be seen that 
the density sample (left panels) possesses equal power throughout the entire domain, even in the unobserved regions. 



propose to classify the large scale structure environment into four 
web types (voids, sheets, filaments and halos) based on a local- 
stability criterion for the orbits of test particles. The basic idea of 
this dynamical classification approach is that the eigenvalues of the 
deformation tensor characterize the geometrical properties of each 
point in space. The deformation tensor T, ; is given by the Hessian 
of the gravitational potential O: 



'' dxjdxj ' ^ 

with O being the rescaled gravitational potential given as (see 
|Forero -Romero et al. 2009): 

V 2 = <5. (7) 

It is important to note, that the deformation tensor, and the rescaled 
gravitational potential are both physical quantities, and hence their 
calculation requires the availability of a full physical density field 
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Structure type 


rule 


Void 


A\ , A2, A3 < A^ 


Sheet 


A i > A t i, and /l2,/l3 < A,i, 


Filament 


A\ , /1 2 > A t h and A3 < A^ 


Halo 


Ai,A2,A^ > A t h 



Table 1. Rules for the dynamic classification of web types. 

in contrast to a smoothed mean reconstruction of the density field. 
As was already mentioned above, and will be clarified in section 
|4.2| the Hamiltonian samples provide such required full physical 
density fields. The deformation tensor can therefore easily be cal- 
culated for each Hamiltonian sample from the Fourier space rep- 
resentation of equation d6). Each spatial point can then be classi- 
fied as a specific web type by considering the three eigenvalues, 
A] > A 2 > A 3 , of the deformation tensor. Namely, a void point 
corresponds to no positive eigenvalue, a sheet to one, a filament 
to two and a halo to three positive eigenvalues l |Forero-Romero| 
|et al.|20 09l. The interpretation of this rule is straight forward, as the 
sign of a given eigenvalue at a given position defines, whether the 
gravitational force at the direction of the principal direction of the 
corresponding eigenvector is contracting (positive eigenvalues) or 
expanding (negative eigenvalues). However, |Forero-Ro mero et al. 
( 2009 1 found that rather than using a threshold value A,/, of zero dif- 
ferent positive values can yield better web classifications. For this 
reason, in this work, we use the extended classification procedure 
as proposed by |Forero-Romero~et al . ( 2009 1. The structures are then 
classified according to the rules given in table [T] By applying this 
classification procedure to all Hamiltonian samples we are able to 
estimate the web type posterior y({T;(J4)}|{iV?}, A,h) of four differ- 
ent web types (Tr(^) = void, T 2 (;4) = sheet, T 3 (x t .) = filament, 
Tii^k) = halo) conditional on the observations and the threshold 
criterion A,/,. 



3 DATA 

In this section we describe the SDSS galaxy sample used for the 
analysis. Additionally, we discuss the data preparation steps re- 
quired to perform the three dimensional density inference proce- 
dure. 

3.1 The SDSS galaxy sample 

We use data from Sample dr72 of the New York University Value 
Added Catalogue (N YU-VAGC) [] This is an update of the cata- 
logue constructed by Blant on et al.| ( (20 05) and is based on the fi- 
nal data release (D R7;|Abazajian et al.|2009| l of the Sloan Digital 
Sky Survey (SDSS; |York"et al.|2000) . Starting from Sample dr72, 
we construct a magnitude-limited sample of galaxies with spectro- 
scopically measured redshifts in the range 0.001 < z < 0.4, r-band 
Petrosian apparent magnitude r < 17.6 after correction for Galac- 
tic extinction, and r-band absolute magnitude -23 < Mo.i r < -17. 
Here Mo.i,. is corrected to its z = 0.1 value using the ^-correction 
code of Bl anton et al.H2003f and |Blanton"& Roweis (2007) and the 
luminosity evolution model of Blanton et al. ( 2003 ). The apparent 
magnitude limit is chosen in order to get a sample that is uniform 

1 http://sdss.physics.nyu.edu/vagc/ 
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Figure 3. Power-spectra of some Hamiltonian samples. The black curve 
corresponds to a linear ACDM power-spectrum. 

and complete over the entire area of the survey. We also restrict our- 
selves to galaxies located in the main contiguous area of the survey 
in the northern Galactic cap, excluding the three survey strips in 
the southern cap (about 10 per cent of the full survey area). In addi- 
tion, we consider only galaxies which are inside a comoving cube 
of side length 750 Mpc. These restrictions result in a final sample 
of 463,230 galaxies. 

The NYU-VAGC also provides the necessary information to 
correct for incompleteness in our spectroscopic sample. This in- 
cludes two parts: a mask which shows which areas of the sky have 
been targeted and which have not, and a radial selection function 
which gives the fraction of galaxies in the absolute magnitude range 
being considered that are within the apparent magnitude range of 
the sample at a given redshift. The mask defines the effective area 
of the survey on the sky, which is 6437 deg 2 for the sample we 
use here. This survey area is divided into a large number of smaller 
subareas, called polygons, for each of which the NYU-VAGC lists 
a spectroscopic completeness, defined as the fraction of photomet- 
rically identified target galaxies in the polygon for which usable 
spectra were obtained. Over our sample the average completeness 
is 0.92. 

3.2 Completeness and selection function 

Three dimensional density field inference requires the definition of 
the linear observation response operator R k , as given in section [2~7] 
This response operator describes to what percentage each volume 
element of the three dimensional domain has been observed. It is 
hence a projection of the product of radial and angular selection 
function into the three dimensional voxelized space. In particular, 
we have to solve the convolution integral: 

R k = R(x k ) = fd?W(2 k -f)f (r(f)) M (a(f), 6(f)) , (8) 

where W(x) is the voxel kernel, f(r) is the radial selection func- 
tion, with r being the distance from the observer and M(a, 6) is 
the angular selection function, where a and S are right ascension 
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and declination respectively. We evaluate this integral numerically 
for the nearest grid point kernel by following different line of sights 
and calculating the contribution of the product of angular and radial 
selection function to each voxel. 

As mentioned above, in this work we used the two dimen- 
sional sky mask and the radial selection function provided by the 
NYU-VAGC. 



3.3 Creating the three dimensional data cube 

The large scale structure sampler operates on a three dimensional 
equidistant grid. In particular, in this work we set up a cubic grid 
with side length 750Mpc and 256 3 voxels. This amounts to a res- 
olution of ~ 3Mpc voxel side length. Since our algorithm relies 
on the correlation function in comoving space, all calculations are 
performed with comoving length units rather than with redshift dis- 
tances. For this reason, we transform all galaxy redshifts z to co- 
moving distances via the relation: 



f'dz- 
Jo c 



1 



H(z) 



(9) 



where z, is the estimated galaxy redshift, c is the speed of light and 
H(z) is the Hubble parameter given as: 



ff(z) = h q Vn,„(i+z) 3 +fic(i+z) 2 + fiA 



(10) 



Further, we choose a concordance ACDM model with a set of cos- 
mological parameters (£1„, = 0.24 , Q. c = 0.00 , Q. A = 0.76 , h = 
0.73, H Q = h lOOkm/s/Mpc) ( |Spergel et aE][2057) . With these 
definitions we can calculate the three dimensional cartesian coordi- 
nates for each galaxy as: 



x = r cos((5) cos(d') 
y = r cos((5) sin(a) 
z = r sin((5) 



(11) 



where a and 5 are the right ascension and declination respectively. 
We then sort the galaxy distribution into the three dimensional 
equidistant grid via a nearest grid point procedure (see e.g. |Hock^| 
|ney & Eastwood 1988). An estimate for the expected number of 
galaxies N can then be calculated as: 



N ■■ 



(12) 



(see e.g. |Kitaura et al.|2009||Jasche et al.|2009| for details). 



3.4 Physical model 

Observations of the galaxy redshifts do not permit direct inference 
of the underlying matter distribution. Various physical effects such 
as galaxy biasing and redshift space distortions must be taken into 
account for proper analyses. This is of particular relevance for the 
choice of power-spectrum required for the sampling procedure (see 
equat ion ([!} ). However, according to the discussion in Erdogdu 
|et al.H2004) and |Kitaura et al.| p009 ) these effects can be greately 
accounted for in a separate postprocessing step, once the contin- 
uous expected galaxy density field in redshift space has been ob- 
tained. For this reason, here we seek to recover the density field in 
redshift space permitting us to test various bias models and redshift 
space distortions correction methods in a subsequent step. 

In particular, the relation between the true underlying dark 
matter density field and the expected continuous galaxy density 
contrast is generally very complicated and involves non-local and 



non-linear bias operators. Several non-local bias models have been 
presented, which mostly aim at correcting the large scale power 
in power-spectrum estimation procedures (s ee e.g. | Tegmark et al.| 
[2004)[Sepc|2000[|Peacock & Smith|2000) |Hamann et al.|2008| >. 
As described in section[2|and |2.2| the Hamiltonian sampler is able 
to account for such bias models. Note however, that a specific bias 
model also fixes the model for the underlying dark matter distribu- 
tion. Therefore, here, we prefer to follow the approach of previous 
works of setting the bias operator to a constant linear factor equal 
to unity ( Erdogdu et al. 2004 , Ki taura et al.|2009") . In this fashion, 
one obtains the expected continuous galaxy density contrast. As 
discussed in Kit aura et al.| ( |206"9| >, the according underlying dark 
matter distribution can then be simply obtained by deconvolving 
the results with a specific scale dependent bias model, permitting 
us to explore various different bias models. 

In a similar manner, one can treat redshift-space distortion ef- 
fects. These are mainly due to the peculiar velocities of galaxies, 
which introduces Doppler effects in the redshift measurement (see 
e.g.|Kaiser|1987||Peacock & Dodds|19^|Hamilton|1998|[DavTs&l 
|Peebles|1983^ . This effect leads to a radial smearing of the observed 
density field in redshift-space and yields elongated structures along 
the line of sight, the so called finger-of-God effect. 

Additionally, there exists a cosmological redshift-space effect 
which is sensitive to the global geometry of the Universe. In par- 
ticular, the comoving separation of a pair of galaxies at z » 0. 1 is 
not determined only by their observable angular and redshift sep- 
arations without specifying the geometry, or equivalently the mat- 
ter content of the Universe (Magira et al. 2000). This effect yields 
anisotropies in the matter distribution especially at z > 1 (see e.g. 



Alcock & Paczynsk?fT979lpvlatsubara &Suto|199 6, BallingeFetaT] 



1996||Popowski et al.|1998| . However, for the volume considered 
in this work (z < 0.27), the dominant redshift-space distortions 
are due to non-linear peculiar motions of galaxies in large over- 
densities. This effect has pronounced consequences for the power- 
spectrum in redshift-space, since it suppresses power on small 
scales. As demonstrated in Erdogdu " et al.| ( |2004) , the redshift- 
space power-spectrum of a fully evolved non-linear matter distri- 
bution is very similar to a linear power-spectrum at the scales rel- 
evant for this work (k < 2 h/Mpc). Here, they used the non-linear 
power-spectrum fitting formula provided by |Smith et al.| l |2003| >. 
However, the exact galaxy power- spectrum in redshift-space is not 
known. The work of Tegmark et al. ( 2006 1 indicates that the recov- 
ered power-spectrum of the SDSS main sample is close to a linear 
power-spectrum, which may be due to the fact that this galaxy sam- 
ple is not strongly clustered. In this case, the redshift-space power- 
spectrum would be even closer to a linear power-spectrum. In any 
case assuming a linear power-spectrum will still permit physically 
accurate matter field inference in redshift-space (Erdogdu et al. 
2004|. For this reason, in the absence of more precise informa- 
tion on the galaxy power-spectrum in redshift-space, here we will 
assume a linear power-spectrum, calculated according to the pre- 
scription provided by Eise nstein & Hu|(l998} and Eisenstein & Hu| 
l |1999[ l. One should also bear in mind that the data itself will gov- 
ern the inference process. For this reason, power-spectra measured 
from the Hamiltonian samples will only be partially defined by the 
a priori power-spectrum guess but mostly by the data. However, 
we defer a more careful treatment of all physical effects including 
a joint inference of density field and power-spectrum to a future 
work. 

It is clear, that precise correction of these redshift-space effects 
requires knowledge about the peculiar velocities of all observed 
galaxies, which is usually not provided by galaxy redshift surveys. 
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Figure 4. Three different slices from different sides through ensemble mean density (left panels), ensemble variance (middle panels) and the three dimensional 
response operator R, (right panels). Especially the variance plots demonstrate, that the method accounted for the full Poisonian noise structure introduced by 
the galaxy sample. One can also see the correlation between high density regions and high variance regions, as expected for Poissonian noise. 



Therefore, precise correction of redshift-space distortions is very 
complicated and subject to ongoing research. In the linear regime, 
the theory behind the observed redshift-space distortions is well 
developed (Kaiser 1987, Hamilton 1998). However, in quasi-linear 
and non-linear regimes, we instead have to resort to making approx- 
imations or using fitting formulae based on numerical simulations 
( |Percival &^ White 2009 1. Literature provides numerous approaches 
to alleviate these redshift-space distortions particularly in power- 
spectrum estimation. Most of these approaches aim at restoring the 



correct power by deconvolution with an redshift-space convolution 
kernel which takes into account the random pair velocities of galax- 
ies in collapsed objects (see e.g Peacock & Dodds 1994, Ballinger 
et al.|1996||Jing et al.|1998[|Hamilton|1998||Kang et al.|2002||Jing 
& Borner|2004||Erdogdu et al.|2004||scoccimarro|2004[|Cabre & 
Gaztanaga 2009, Percival & White 2009). Such techniques have 
been adopted to correct Wiener density reconstructions by apply- 
ing a redshift distortion operator to the final result, in order to re- 
store the correct power < |Erdogdu et al.|2004| |Kitaura et aL|2 009 ) . 
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Figure 5. The plot shows the relative density to variance ratio oij. In com- 
parison to the lower panels of figure [4] it indicates a high signal-to-noise 
ratio in regions of high density as expected for Poissonian noise. 



However, it must be noted that this method does not account for 
the correction of phase information, and therefore only corrects the 
two-point statistics of the recovered density field. 

Three dimensional density inference hence requires redshift- 
space distortions corrections which also account for phase informa- 
tion and would be dependent on the density or gravitational poten- 
tial. In the linear regime it is possible to apply an inverse redshift- 
space operator which transforms the redshift-space density to its 
real-space counterpart ( Taylor & Valentine 1999; D'Mellow & Tay- 
|lor|2000) . However, it does not account for the strongly non-linear 
regime which mostly generates the finger-of-God effect. For this 
reason Te gmark et al.|{2~004[ > proposed & finger-of-God compres- 
sion method. Here they use a standard friends-of-friends algorithm 
to identify a cluster by taking into account different density thresh- 
olds, which set the linking length. They then measure the disper- 
sion of galaxies about the cluster center along the line of sight and 
in transverse direction. If the radial dispersion exceeds the trans- 
verse dispersion, the cluster is compressed radially until the radial 
dispersion equals the transverse dispersion (Te gmark et al.|2004| >. 
However, it is not clear to what degree such a method would falsely 
isotropize filaments or under dense objects along the line of sight to 
spherical clusters. Such a method of isotropizing the density field, 
however, can also be applied in a post processing step, by noting 
that a density threshold refers to a linking length in the friends-of- 
friends algorithm. 

Nevertheless, the above correction methods mask the fact that 
redshift-space distortions introduce statistical uncertainties. Thus 
unique recovery of the real-space density field is generally not pos- 
sible. A full characterization of the joint uncertainties of the real- 
space density hence would require to carefully take into account the 
uncertainties introduced by redshift-space distortions or the lack of 
knowledge on peculiar velocities. This can be achieved by intro- 
ducing a density dependent peculiar velocity sampling scheme to 
our method, as proposed by Kitaura & EnBlin ( 2008 ). However, we 
defer sampling of the peculiar velocities to a future work. 



4 RESULTS 

In this section we describe the results obtained from the large scale 
structure inference procedure. 



4.1 Convergence test 

HADES is a Markov Chain Monte Carlo sampler and hence we 
have to test, if the individual Hamiltonian samples really repre- 
sent the lognormal Poissonian posterior. Convergence diagnostic 
of Markov chains is subject of many discussions in literature (see 



e.g.|Heidelberger & Welch|1981||Gelman & Rubin|1992||Geweke| 
|1992||Raftery & Lewis|1995||Cowles & Carlin|1996||Hanson|200Tf 



Dunkley et al. 2005 1. However, here we apply the widely used Gel- 
man & Rubin diagnostic, which is based on multiple simulated 
chains by comparing the variances within each chain and the vari- 
ance between chains (Gelman & Rubin 1992 ). In particular, we cal- 
culate the potential scale reduction factor (PSRF) (see |Jasche & Ki-| 
|taura|2009| >. A large PSRF indicates that the inter chain variance is 
substantially greater than the intra chain variance and longer chains 
are required. Once the PSRF approaches unity, one can conclude 
that each chain has reached the target distribution. We calculated 
the PSRF for each voxel in our calculation domain. The result for 
this test is presented in figure [I] It indicates convergence of the 
Markov chain. However, it can be seen that some regions of the do- 
main converge faster than others. This is due to the fact, that not all 
regions of the cubical volume have been observed equally. Regions 
which contain good observations converge faster, since there the 
probability distribution is narrower, while poorly or non observed 
regions converge slower, since the space of possible solutions is 
larger. Also note, that the Gelman & Rubin diagnostic is gener- 
ally a conservative test, and other tests might indicate convergence 
much earlier. However, this test clearly demonstrates that the qual- 
ity and amount of observational data can have a strong impact on 
the convergence behavior of the chain. 



4.2 Hamiltonian samples 

Since the Markov chain converges we can conclude, that the in- 
dividual samples are really samples from the large scale structure 
posterior. At this point it is important to insist that the Hamilto- 
nian samples are not the result of a filtering procedure. A filter 
generally suppresses the signal in low-signal to noise regions, and 
therefore produces biased estimates for the physical density field. 
This is not the case for the individual Hamiltonian samples. Since 
they are random realizations of the lognormal Poissonian posterior, 
they are unbiased density fields in the sense that they possess cor- 
rect physical power throughout the entire cubical volume. As an 
example we present slices through an arbitrary density sample in 
figure [2] Already visually, one has the impression, that the den- 
sity field has equal power throughout the entire domain, even in 
the unobserved regions. This is because the Hamiltonian sampler 
non-linearly augments the poorly or not observed regions with sta- 
tistically correct information. Each density sample therefore is a 
proper physical density field, from which physical quantities can 
be derived. To demonstrate this, we measure the power-spectra of 
some of these Hamiltonian samples. The result is presented in fig- 
ure]^] As can be seen, the power-spectra of the individual samples, 
are very close to the assumed linear ACDM power-spectrum. The 
deviations at large scales and small scales are due to the impact 
of the data. At small scales the deviation can be explained by red- 
shift space distortions, while at the largest scales cosmic variance 
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Figure 6. Commulative probability distributions of the density at randomly 
chosen points in the volume. The cumulative probability distributions have 
been evaluated for 20 threshold values <$,/,. The two horizontal lines indicate 
the P (<5 < 6 t h) = 0.5 and 0.9 thresholds respectively. 



is dominant. There is clearly no sign of artificial power loss due to 
the survey geometry. Since the individual samples are valid density 
field realizations, it is easy to derive meaningful physical quanti- 
ties, such as the gravitational potential, cosmic flows or the tidal 
shear tensor as demonstrated in the remainder of this paper. 



4.3 Ensemble mean and variance 

Here we want to present the ensemble mean and variance for the set 
of 40000 Hamiltonian samples, each consisting of 256 3 voxels. For 
comparison with a single density sample the middle panels of fig- 
ure|2]show the according slices through the ensemble mean density 
field, which exhibits many interesting features. First of all, it ren- 
ders remarkably well the filamentary structure of our cosmic neigh- 
borhood. Many clusters, filaments and voids can clearly be seen by 
visual inspection. In the unobserved regions the ensemble mean 
density amplitudes drop to the cosmic mean for the density con- 
trast 6 = 0, just as required by construction. Structures close to the 
observer, at cartesian coordinates (0, 0, 0), are more clearly visible 
than structures at larger distances. Especially, filaments and voids 
are less prominent at larger distances. This is due to the observa- 
tional response operator Rj, which due to the radial selection func- 
tion drops to very low values at large distances. Therefore, once a 
galaxy is detected far away from the observer, it must reside inside 
a large overdensity, and hence inside a cluster. This expectation is 
clearly represented by the ensemble mean density field. Another 
interesting point to remark is, that the borders to the unobserved re- 
gions are not very sharp. Some of the observed information is non- 
linearly propagated into the unobserved regions, since our method 
takes into account the correlation structure of the underlying signal. 
It can therefore be seen, that some clusters and voids are interpo- 
lated further out into the unobserved regions. In comparison to the 
Wiener filter as previously applied to SDSS data by Kitau ra et al.| 
(2009 ), it seems that the Hamiltonian sampler is more conservative 
and less optimistic for the extrapolation of information into the un- 
observed region. This may be due to the fact, that here we take into 
account the full Poissonian noise statistics rather than restricting 
the noise to a Gaussian approximation. Beside the ensemble mean, 



here we also calculate the ensemble variance per voxel, which is 
the diagonal of the full ensemble covariance matrix. Some slices 
through the ensemble mean, ensemble variance and the according 
slices through the observational response operator are presented 
in figure [4] Here the middle panels correspond to ensemble vari- 
ance. At first glance, one can nicely see the Poissonian nature of 
the galaxy shot noise. High density peaks in the ensemble mean 
map correspond to high variance regions in the ensemble variance 
map, as expected for Poissonian noise. One can clearly see that 
the Hamiltonian sampler took into account the full three dimen- 
sional noise structure of the galaxy distribution. Additionally, with 
larger distance to the observer, the average variance increases, as is 
expected due to the radial selection function. It is also interesting 
to remark, that some voids have been detected with quite low vari- 
ance, hence with high confidence. Note, however, although here we 
only plotted the diagonal of the density covariance matrix, the full 
non-diagonal covariance structure is completely encoded in the set 
of Hamiltonian samples, and can be taken into account for future 
analysis. Also note, that the variance slices show high variances in 
regions where many galaxies have been observed. This is a key fea- 
ture of the Poisson statistics, because the standard deviation is equal 
to the square-root of the number of individual galaxies. That is, if 
there are N galaxies in each voxel, the mean is equal to N and the 
standard deviation is equal to y/N. This makes the signal-to-noise 
ratio equal to y/N for such an homogeneous case. To emphasize the 
fact, that regions which show high variances have also high signal- 
to-noise ratios, we calculate the density to variance ratio: 

(1 + (St)) 

(13) 



CO, 



y[($) - (Si) 2 



The result of this calculation is presented in figure [5] for the case 
of the lower slices of figure [4] It clearly indicates high signal-to- 
noise ratios in high density regions. In addition, we also estimate 
the cumulative probabilities P ((5, < 6,/,), at twenty different density 
threshold values £,/,, for the density found at each voxel. This cu- 
mulative probabilities are estimated from the Hamiltonian samples 
by: 



P(Si<S th ). 



' &(S lh - Si) 



(14) 



where n labels the individual Hamiltonian samples, N simp is the to- 
tal number of samples and O(x) is the Heaviside function. These 
cumulative probabilities allow for example to estimate the median 
density at each voxel, and can be usefull, when analyzing galaxies 
in their cosmic environment as will be done in a following project. 
Some such cumulative probability distributions, chosen randomly, 
are shown in figure [6] As can be seen, the recovered density am- 
plitudes extend over a large range, from small linear to very high 
non-linear values. 



5 WEB CLASSIFICATION 

Already in the introduction we mentioned that the results presented 
in section [4] are to be used for analyzing galaxy properties in the 
large-scale environment in a future work. Such analyses also re- 
quire the classification of the large scale density field into differ- 
ent web type objects. Therefore, in order to characterize the en- 
vironment of our SDSS galaxy population, here we apply the dy- 
namic web classification procedure, as described in section |2~3) to 
the set of Hamiltonian samples. A similar analysis has been previ- 
ously carried out by Lee & Erdogdu (2007) and |Lee & Lee| ( 2008) 
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Figure 7. Ensemble mean of the eigenvalues of the deformation tensor. 



based on a Wiener mean density reconstructions of the 2MASS 
Redshift survey to study alignments of galaxy spins with the tidal 
field and the variation of galaxy morphological type with environ- 
mental shear. 

Here we will follow a similar procedure to classify each indi- 
vidual voxel of a given Hamiltonian sample as one of the four web 
types T,-, with the different types being T] = void, T 2 = sheet, T 3 = 
filament, T4 = halo. To do so, we perform the following three steps 
for an individual Hamiltonian sample: 

(i) Solve equation {6]l for the deformation tensor Tjj by means 
of Fast Fourier Transform techniques 

(ii) Solve the cubic characteristic equation for the three eigen- 
values of the deformation tensor at each spatial position 

(iii) Apply the rules given in table[T|to classify the web type at 
each spatial position for a given threshold value A lh . 

The result of this procedure for the nth sample is then a unit four 
vector T n 04) at each voxel position £ k . All of the entries of this 
four vector are zero except for one, which indicates the web type. 
Applying the above method to all Hamiltonian samples will yield 
a set of classification four vectors, which encodes the information 
and uncertainty of the observations. Additionally, as an intermedi- 
ate result, we obtain the set of the three eigenvalues for each in- 
dividual Hamiltonian sample. Slices through their ensemble mean 
estimates are presented in figure [7] 

However, rather than summarizing the results in terms of mean 
and variance here we want to estimate the full cosmic web poste- 
rior. This is achieved by counting the relative frequencies for each 
web type at each individual spatial coordinate within the set of 
Hamiltonian samples. With these definitions we yield the cosmic 
web posterior for each web type as: 



Zj„=1 ^j=l T,(.? t )T' 



(4) 



AL 



(15) 



where n labels the individual Hamiltonian samples, ALmp is the to- 
tal number of samples and Sf- is the Kronecker delta. The cosmic 
web posterior incorporates all observational information and uncer- 
tainties, and enables us to determine how well different structures 
can be classified with respect to observational uncertainties. 

We evaluate the cosmic web posterior for four different val- 
ues of A.,h, with A,i, = 0.0,0.2,0.4, 1.0. Slices through the cosmic 
web posteriors for the four different cases are presented in figure 
[8] It can be clearly seen, that the properties of the survey geometry 



are represented by the four posterior distributions. While the web 
classification in the observed regions clearly follows the structure 
of the underlying density field, it obviously can not provide a clear 
classification of unobserved regions. Also with distance to the ob- 
server, the web classification becomes more and more uncertain. 
In this fashion, the cosmic web posterior renders the uncertain- 
ties introduced by the radial selection function and the resulting 
higher shot noise contribution at larger distances. The impact of 
the A, h threshold can be observed when comparing the four cosmic 
web posteriors. In the case of A lh = 0.0 the cosmic web consists 
of many small isolated voids, which occupy only a small fraction 
of the total area of the slice. With increasing threshold A th , voids 
become bigger and more connected until they completely domi- 
nate the cosmic web for the case A t j, = 1.0. The opposite behavior 
can be observed in case of the halo posteriors, as the number of 
clearly detected halos declines with increasing threshold A, h . Fol- 
lowing Forero-Romero et al. ( 2009!), we also calculate the volume 
occupied by each web type (the volume filling fraction - VFF) and 
the fraction of mass contained in such a volume (mass filling frac- 
tion - MFF). The results are p resented in figure [9] a nd show the 
same behavior as described in Forero-Romero et al. ( 2009]). Fig- 
ure|9]supports the visual impression, gained by inspection of figure 
[5] that especially the VFF and MFF for voids strongly depend on 
the threshold value A, h . This shows that voids can serve as a sensi- 
tive monitor and indicator of the cosmic web I Forero-Romer oet al.| 
2009). Unfortunately, Fore ro-Romero et al.| l |2009] ) do not provide 
an explicit gauging of the A th values from simulations. Such a gaug- 
ing and hence a clear definition of the different cosmic web types 
would be very valuable for these types of analysis. 

Having now a representation of the web type posterior we can 
for example calculate the odds Oj(x k ) ratio given as: 



0,04) 



p{%m{N s k },A lll ) l - p (T,(J4)) 



(16) 



which tells us how much a specific web type is favored over all 
others. Here, the V (Tj(j4)) can be obtained by averaging over all 
voxels in the volume. In example, this permits us to build a simple 
structure type map m(x^) which can be used for visual analyses as 
presented in the next section. Such a map can be defined as: 



m(x k ) : 



T,04) 
undecided 



for 0,04) > O lh 
else 



(17) 



where 0,i, is an odds threshold usually chosen larger than unity. 
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Figure 8. Slices through the cosmic web posterior for the threshold values = 0.0,0.2,0.4, 1.0 (from top to bottom) for the four different web types. It is 
interesting to note, that sliced sheets look filamentary, while filaments piercing the slice appear as dots. 

The galaxies falling into a given M* range are plotted in the corre- 
sponding panel, with red (blue) galaxies being shown as red (blue) 
dots. Here we classify each galaxy into red or blue population using 
its g — r color and the luminosity-dependent divider as described in 
|Li et al.| ( |2006| > (see their Eq. 7 and Table 4). The observer on Earth 
is at the bottom right-hand comer of the slice where x = and 
y = Mpc. The density field with z = 302.16 ± 4.5 Mpc is pro- 
jected onto the x — y plane and is repeated in every panel. In figure 
^]the background density field is coded by the mean overdensity, 
ln(2 + (8j)), averaged for each pixel over the z range probed and the 
40,000 Hamiltonian samples. In figure [TT] we present a structure 
type map as defined in equation ( |17} by choosing an odds threshold 
of 0,i, = 1.55 and A,,, = 1.0. Each pixel of this map is color-coded 
by the web type which is determined by our classification algorithm 



6 GALAXY PROPERTIES VERSUS LSS 

In this section we present a preliminary, but intuitive examination 
of the correlations between the large-scale environment of galaxies 
and their physical properties. Here we consider two properties of 
galaxies: stellar mass M* and g — r color, and study how these are 
correlated with the overdensity 6 of the large-scale environment 
and its type, which is one of the four web types as classified as 
halo, filament, sheet, and void. We will come back to this topic in a 
separate paper by considering more physical properties of galaxies 
and performing more careful and quantitative analyses. 

Our results are shown in figures [T0| and 1 1 1 1 where we plot the 
galaxies in our sample with different stellar masses and g—r colors, 
on top of a slice through the ensemble mean density field. In each 
figure the four panels correspond to four M 4 intervals as indicated. 
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Figure 9. Volume and Mass filling factors as a function of A,/,. Continuous lines denote voids, dashed lines sheets, dotted lines filaments and dot-dashed lines 
halos. Especially the void VFF and MFF respond strongly to a change in A,/, making them a sensitive measure of the cosmic web ( Forero-Romero et al. 2009). 



described above, with types of halo, filament, sheet and void being 
plotted in black, light grey, dark grey and white respectively. 

Qualitatively the galaxies plotted in these figures appear to 
closely trace the underlying large-scale structure. This is not sur- 
prising because, by construction, the latter is reconstructed from the 
former. However, careful comparison of the different panels reveals 
a number of interesting trends. First, there exists a clear correlation 
between galaxy mass and the large-scale environment, regardless 
of how the environment is quantified. More massive galaxies tend 
to reside in regions with higher densities and more halo-like struc- 
tures. At the highest masses, almost all galaxies are confined within 
regions of high densities, or those of halo and filament types. As M* 
decreases, more and more galaxies are found in void-like regions. 
Second, at fixed stellar mass, galaxy color also appears to be corre- 
lated with large-scale environment. Red galaxies trace the density 
field more closely than blue galaxies. At all masses, the distribution 
of blue galaxies is more extended across the different types of struc- 
tures. At low masses, the blue population dominates the galaxies in 
void-like environment. 

These trends are consistent with recent similar studies by |Lee| 
|& Lee| ( [2008} and |Lee & Li] |2008) , which were based on much 
shallower galaxy samples (thus smaller volume), and also with the 
clustering analyses of Li et al. (2006). More work is needed in order 
to have more quantitative characterization of the relationships be- 
tween galaxy properties and the large-scale environment, and thus 
more powerful constrains on galaxy formation models. These re- 
sults, in turn, can be fed back to the large scale structure inference 
and help to improve our cosmographical description of the Uni- 
verse. 



7 SUMMARY AND CONCLUSION 

In this work we present the first application of the non-linear, 
non-Gaussian Bayesian large scale structure inference algorithm 
HADES to SDSS DR7 data. 

HADES is a numerically efficient implementation of a Hamil- 
tonian Markov Chain sampler, which performs sampling in ex- 
tremely high parameter spaces usually consisting of ~ 10 7 or 
more free parameters. In particular, HADES explores the lognor- 
mal Poissonian density posterior, which permits precision recovery 



of poorly sampled objects and density field inference deep into the 
non-linear regime ( |Jasche et al.|2009| >. 

The large scale structure inference was conducted on a cu- 
bic equidistant grid with sidelength of 750 Mpc consisting of 256 3 
voxels, yielding a grid resolution of about 3 Mpc. The large scale 
structure inference procedure correctly accounts for the survey ge- 
ometry, completeness and radial selection effects as well as for the 
correct treatment of Poissonian noise. The analysis yielded about 
3 TB of valuable scientific information in the form of full three 
dimensional density samples of the lognormal Poissonian density 
posterior. This set of density samples is thus a sampled represen- 
tation of the full non-Gaussian density posterior distribution and 
therefore encodes all observational systematics and statistical un- 
certainties. Hence, all uncertainties and systematics can seamlessly 
be propagated to any finally inferred quantity, by simply apply- 
ing the according inference procedure to the set of samples. In this 
fashion, the results permit us to make precise and quantitative state- 
ments about the large scale density field and any derived quantity. 

We stress that our Hamiltonian samples are not the result of a 
filtering procedure. A filter generally suppresses the power of the 
signal in low signal-to-noise regions and therefore does not yield 
a physical meaningful density, since it lacks power in poorly or 
unobserved regions. However, each Hamiltonian density sample 
represents a complete physical matter field realization conditional 
on the observations, in the sense that it possesses correct phys- 
ical power throughout the entire volume. Already visual inspec- 
tion of these density samples shows a homogeneous distribution of 
power throughout the entire volume. This fact was emphasized by 
the demonstration of power-spectra measured from these density 
samples, which show no sign of being affected by lack of power 
or artificial mode coupling nor do they show any sign of being af- 
fected by an adaptive smoothing kernel as would be expected for 
filter applications. It should be noted that this fact marks the crucial 
difference of our method to previous filter based density estimation 
procedures. 

In section [43] we estimated the ensemble mean and the ac- 
cording variance from the 40000 density samples. The estimated 
ensemble mean nicely depicts the cosmic web consisting of fil- 
aments, voids and clusters extracted from the SDSS data. It is 
clear, that the ensemble mean represents the mean estimated from 
the lognormal Poissonian posterior conditional on the SDSS data. 
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Figure 10. SDSS galaxies overplotted on the ensemble mean density field. The blue and red dots denote blue and red galaxies respectively, and the different 
panels depict galaxies in different stellar mass M* bins. 



Therefore, it encodes the observational uncertainties and system- 
atics. This can be seen by the fact, that the ensemble mean ap- 
proaches cosmic mean density in poorly or not observed regions. 
Further, we plotted the according variance, which demonstrates 
that the non-Gaussian behavior and structure of the Poissonian shot 
noise was correctly taken into account in our analysis. Especially, 
the expected correlation between high mean density and high vari- 
ance regions was clearly visible. We also estimated the cumulative 
probabilities for the density amplitude at each volume element, and 
demonstrated that the recovered density fields truly cover the broad 
range from linear to non-linear density amplitudes. 

To characterize the environment of our galaxy sample, but also 
to demonstrate the advantages of the Hamiltonian samples, we per- 
formed an example cosmic web type classification in section [5] In 



particular, we followed the dynamical cosmic web classification 
approach of|Hahn et al.H2007| > with the extensions proposed by 
|Forero-Romero et al.| i2009i. This procedure involves the calcu- 
lation of the cosmic deformation tensor and its eigenvalues. We 
demonstrated that this procedure can easily be applied to the set 
of samples, since they represent full physical matter field realiza- 
tions. As a byproduct of this procedure we estimated the ensemble 
mean for the three eigenvalues of the cosmic deformation tensor. 
Further, we classified the individual volume elements as one of the 
four different web types void, sheet, filament and halo. The classi- 
fication into four discrete web types enabled us to explicitely esti- 
mate the cosmic web posterior, which provides the probability of 
finding a specific web type at a given point in the volume condi- 
tional on the SDSS data. This result is especially appealing from a 
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Figure 11. Same as figure [To] but here the galaxies are overplotted on a structure type map as defined in section|5] The color coding denotes the web type: 
halo (black), filament (light grey), sheet (dark grey) and void (white). Regions, which are marked as undecided according to our criteria, equation \\7\ with 
O t h = 1.55, are colored yellow. 



Bayesian point of view, since it emphasizes the fact, that the result 
of a Bayesian method is a complete probability distribution rather 
than just a single estimate. Here we saw, that especially voids are 
a sensitive measure for the cosmic web. Of course, it is possible to 
repeat the cosmic web classification in a similar manner with any 
other classification procedure. 

In the following section [6] we presented a preliminary exam- 
ination of the correlation between the large-scale environment and 
physical properties of galaxies. In particular, we considered the 
stellar mass and g — r color of galaxies in relation to the density 
contrast 5. A qualitative analysis revealed that there exist correla- 



tion between these galaxy properties and the large scale structure. 
In particular, massive galaxies are more likely to be found in mas- 
sive structures, while low mass galaxies reside in void like struc- 
tures. The plots also demonstrate the different clustering behavior 
of red and blue galaxies. Also note, that these observed trends are 
consistent with previous works ( |Lee & Lee|2008[|Lee & Li|20O8[ 
Li et al. 2006 ). However, more work is required in order to provide 
quantitative statements. This will be done in a forthcoming publi- 
cation. 

The results presented in this work will be valuable for many 
subsequent scientific analyses of the dependence of galaxy proper- 
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ties on their cosmic environment. Herefore, particularly the Hamil- 
tonian samples allow for a more intuitive handling of observational 
data, since they can be understood as full matter field realizations 
or different multiverses consistent with our data of the Universe we 
live in. Beside providing quantitative characterizations of the large 
scale structure, the results also give us an intuitive understanding 
of the three dimensional matter distribution in our cosmic neigh- 
borhood. We intend to make our data publically available to the 
community. 

Future applications will also take into account non-linear bias 
models and peculiar velocity sampling procedures, to provide even 
more accurate density analyses. 

We hope that this work demonstrates the potential of Bayesian 
large scale structure inference and its contribution to current and 
future precision analyses of our Universe. 
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